ACCOUNTABILITY AND ASSESSMENT:



IS PUBLIC INTEREST IN K-12 EDUCATION BEING SERVED?

Joan L. Herman 

CRESST/University of California, Los Angeles 

Abstract 

The reauthorization of No Child Left Behind (NCLB) makes this a good time to consider
whether and how current accountability serves the public interest and whether and how it
can better do so. This report explores these issues in the context of the current literature 
on the effects of accountability in K-12 education. It considers the meaning of “public 
interest” and offers a model of how public interest may be served through accountability 
to benefit student learning. The report considers how well the model fits available 
evidence by examining whether and how accountability assessment influences students’ 
learning opportunities and the relationship between accountability and learning. 

The Meaning of Public Interest 

What is the public interest? While policy debates, politicians, the media and public 
groups often evoke it, public interest is a slippery concept to define. Reich (1988) speaks of 
transcendent ideas and concerns for the good of society, rather than self-interest, that 
motivate political action. Moyers (2007) notes that the proposition that each of us has the 
right to “life, liberty, and the pursuit of happiness” is the foundation of this country and that 
this proposition carries with it the imperative that members of society have “obligations to 
each other, mutually and through their government, to ensure that conditions exist enabling 
every person to have the opportunity for success in life.” But as Hochschild and Scovronick 
(2004) have observed, in the context of public schooling, the proposition blends both 
collective and individual responsibilities, and contains inherent conflicts between policies 
designed for the good of ALL students and those designed to enable individuals to succeed, 
particularly the privileged of society. 

Different perspectives on what constitutes the public interest and the policies that can 
promote it grow out of differing ideals and the conflicts among them, varying definitions of 
basic societal goals such as liberty and equality, and different analyses of the sources of 
problems and obstacles (Stone, 1998). What constitutes the public interest is an interaction 
between the facts as one sees them and one’s values. For example, some see the success of 
Department of Defense Schools as support for integration, high academic expectations, 
shared decision-making and investment in professional development for educators; others see 
it as ratifying their ideas about the importance of home culture and discipline.

Whose and how many individuals’ interests need to be served? How should a policy be
designed to address the public interest? These remain open questions. Do all or nearly
all need to be served? To what end? Is an action that serves some but hurts none in the public
interest? What of policies that serve the many, but hurt the few? And how many are “many”
and how “few” can few be? While these may be unanswerable questions, they reflect
tensions that need to be balanced in any discussion of whether and how current
accountability and assessment systems may serve the public interest to benefit (or not) the
education and learning of K-12 students.

Do current accountability systems serve all students? Certainly if action in the public 
interest means serving the needs of those who otherwise would be left unsatisfied, then to be 
considered in the public interest, accountability must benefit students who traditionally have 
been under-served — economically disadvantaged students, English learners and diverse 
students of color. Yet if all students are to be served, then the system also needs to benefit — 
or at least not hurt — students who have traditionally been higher achieving, including our 
highest ability students. (As we shall see, however, it is difficult to design a single test and 
system that well serves students at different points of the distribution.) Furthermore, 
consideration of public interest must address long term and unanticipated side effects as well 
as immediate effects. Accountability that promotes attention to the short term, bottom line of 
student performance must yield long-term benefits for student learning and for public 
education. 

The concept of public interest also brings with it a basic concern with social ends and 
goals. If we are an accountable society — as citizens, as a body politic responsible for 
others — what should we and education be held accountable for in terms of student learning? 
Recent commissions have raised questions again about whether schools are sufficiently 
preparing students for creative thinking and problem solving and in science and technology 
for this country to keep its competitive edge (Partnership for 21st Century Skills, 2004; 
Friedman, 2005). Furthermore, in the rush to reach consensus on the meaning of proficiency 
in reading and mathematics, we seem to have skipped over the dialog and potential 
disagreements on the goals of schooling (Ramaley, 2005) as well as having settled for 
standards that fall short of clearly articulating the academic knowledge and skills that 
students will need for future success (Wilson & Berenthal, 2005). Democracy carries with it 
the responsibility to help create citizens who will recognize and serve the public good, not 
only their own interests (Parker, 2003). The public too apparently wants schools that promote 
self-discipline and social responsibility (Mathews, 2006). But schools currently seem
overwhelmed by the need to raise test scores and meet academic mandates, and public
interest goals seem to be beyond current, official standards and expectations for schooling.

The Role of Accountability in Serving the Public Interest: A General Model 

Merriam-Webster’s Dictionary defines accountability as “the quality or state of being
accountable; especially: an obligation or willingness to accept responsibility or to account for
one’s actions.” In current educational contexts, the concept carries with it the idea that
individuals, organizations and the community not only are responsible for their actions, but
must also answer for their performance to an outside authority that, in turn, may impose a
penalty for failure. Schools and students are responsible for teaching and meeting learning
goals (no excuses, no blame game, no victimhood), and under No Child Left Behind, there
are serious sanctions for districts, schools and teachers failing to meet those goals. In the
simplest sense, students come to school to learn, and schools and the educators within them
exist to teach and to promote student learning. Since tests show which students and what schools
are meeting or exceeding standards and those that are not, students and teachers who are
falling short should be held accountable for their failure (and less frequently, those who
succeed beyond expectations should be rewarded for their success).

While this is a basic view of bureaucratic accountability, Darling-Hammond (2006)
notes the importance of professional and capacity-building forms of accountability as well.
At its core the broader concept of accountability contains a strong ethical and internal
orientation, a concern for the welfare of others, and a commitment to efficacy. Teaching has
been called a “calling” as well as an occupation, and clearly most teachers are committed to
their students’ learning, and get satisfaction from their own efficacy, independent of
external incentives. In fact, motivation researchers have long contrasted internal and external
motivation, and their research suggests that external rewards reduce internal motivation (Deci
& Ryan, 2000).

Leaving aside the professional accountability and intrinsic motivational issues for the 
moment, the role that accountability is intended to play in today’s standards-based reform 
seems relatively straightforward and well established. All states except Iowa have established 
standards for what students should know and be able to do. Spurred in part by the No Child Left
Behind Act (P.L. 107-110, 2001), and the Goals 2000: Educate America Act (P.L. 103-227,
1994) that preceded it, states have created assessments that make explicit for schools and the 
students within them what the standards mean. Pressured by fear of sanctions (and less often
by rewards), teachers and students are motivated to teach/learn the expected standards and to
use the information from the assessment to improve their efforts, even as those same
assessment results reveal who has succeeded in meeting targets or expectations, and who has
not. The assessment system thus serves both technically as a performance measurement
system that provides feedback and as a motivational system that serves a number of
socio-political or symbolic purposes: establishing the target for reform efforts; communicating to
educators, administrators and parents what is expected; insisting on high expectations for all
students; providing incentives and/or sanctions; and thereby stimulating all levels of the
education system to focus on achieving the NCLB goals for adequate yearly progress (AYP),
ostensibly assuring that all children will be proficient by the year 2014.

Figure 1 shows one view of how accountability is supposed to work: Accountability
sets the context and creates incentives for educational action to enable all students to attain
standards. State standards thus are the foundation on which the whole system sits, and the
theory of action assumes that these standards establish clear and important goals for student
learning.

For students to attain standards, educators must take action to improve students’
learning opportunities (termed OTL in Figure 1, and also known as opportunity to learn) in
what and how well students are taught in classrooms, through supplemental services and
programs, and through specially targeted in- and out-of-school activities and interventions.
And these improvements in OTL, in turn, are necessary precursors to improvements in
students’ learning, as indicated by performance on state tests and other indicators of students’
progress toward or attainment of standards.

Feedback from the assessments is used to improve learning opportunities for students in
terms of targeting instruction on areas of need and evaluating and refining educational
programs, materials and strategies to increase students’ attainment of standards. Because
NCLB requires that every subgroup of students within the school attain established adequate
yearly progress targets, all students must be provided with effective learning opportunities,
including whatever augmented programs and special services that traditionally low achieving
students (children of poverty, English learners and students with disabilities [SWD]) may
need to attain success.

Figure 1. Accountability Model.
Note. OTL = Opportunity to learn. 



Surely, however, improving student learning is not only a bureaucratic and 
management problem. Darling-Hammond (2006) notes the importance of professional and 
capacity-building forms of accountability as well. Recent research on the power of formative 
classroom assessment underscores these sources of accountability in that it shows not only 
the value of on-going assessment relative to accountability assessment but also supports the 
benefits of reflective, professional practice (Black & Wiliam, 1998). 

While Figure 1 focuses on the impact of accountability on students’ opportunity to 
learn, the underlying theory of action assumes that the federal government, states, districts, 
and schools will be accountable for assuring that there are sufficient talent, resources,
policies and practices at all levels of the educational system and that these will be
coordinated and integrated to support teaching and learning. Moreover, policymakers and
actors at these levels are expected to use the feedback from state assessments for
management and improvement purposes: to gauge their strengths and weaknesses; to identify
students, schools, and classrooms that may need special help; and to be strategic in taking 
action and coordinating available resources to improve student performance, e.g. through 
professional development, instructional materials, mentoring, and technical assistance.

This simplified theory provides a starting point for examining whether and how
accountability is serving the public interest. Despite intractable problems in attributing
causality and the innumerable other factors in state and local policy and practice that have an
influence, I ask: Do standards provide a sound foundation for accountability? What is the
evidence that accountability and assessment are improving students’ opportunities to learn? Is
there evidence that accountability is promoting student learning and attainment of standards? 
For whom? 

Quality of Standards 

As noted above, high quality standards are the foundation on which the whole 
enterprise of standards-based accountability and reform rests, as standards supposedly 
provide the reference point for all action. Recent reviews, however, raise questions about the 
quality of that foundation. For example, Finn, Petrilli and Julian (2006) reviewed state 
standards in English-Language Arts, Mathematics, History and Science, based on their clarity 
in communicating what students ought to know and be able to do, their academic rigor, and 
their attention to the most important knowledge in the discipline. Summarizing ratings across 
subjects and review criteria using an A-F grading scale, the researchers report that states on 
average score a C-minus, although a few states were regarded as exemplary. 

Similarly, the National Research Council’s (NRC) review of science standards across 
the states (Wilson & Berenthal, 2005) found great variety in the kinds of knowledge 
privileged — states tended to focus on lower level declarative and procedural knowledge 
(define, know, describe), while some also attended to higher level schematic and strategic 
knowledge (predict, justify, compare, analyze, explain). NRC also found great variety in the 
scope of content addressed, how broadly or specifically content was defined, and found most 
states unrealistic in the number of learning goals that feasibly were possible to attain over the 
course of a year(s), highly variable in attention to the most important science content, and 
vague in defining performance expectations. No state’s standards meet the committee’s 
criteria of: 

• Clear, detailed and complete 

• Reasonable in scope 

• Rigorous and scientifically accurate 

• Based on sound models of learning 

• Describe performance expectations and identify proficiency levels

Without a clear and realistic target to aim for, educators tend to rely on tests to define
expectations. Absent rigorous, important and accurate content, standards provide a faulty 
foundation for assessment and instruction and can focus the educational system on trivial and 
superficial learning. With these caveats in mind, I turn next to evidence on the effects of 
accountability on students’ learning opportunities. 

Effects on Learning Opportunities 

Relying largely on survey, interview, and observation data from teachers and school
administrators, substantial research over the last decade has shown a consistent picture of the
effects of state-level accountability testing on curriculum and teaching; see, for example:
Arizona (Smith & Rottenberg, 1991), California (Hamilton et al., 2007; Gross & Goertz,
2005; McDonnell & Choisser, 1997), Florida (Gross & Goertz, 2005), Georgia (Hamilton et
al., 2007), Kentucky (Koretz, Barron, Mitchell, & Stecher, 1996; Stecher, Barron, Kaganoff,
& Goodwin, 1998; Borko & Elliott, 1998; Wolf & McIver, 1999), Maine (Firestone,
Mayrowetz, & Fairman, 1998), Maryland (Lane, Stone, Parke, Hansen, & Cerrillo, 2000;
Firestone et al., 1998; Goldberg & Rosewell, 2000), Michigan (Gross & Goertz, 2005), New
Jersey (Firestone, Camilli, Yurecko, Monfils, & Mayrowetz, 2000), North Carolina (Gross &
Goertz, 2005; McDonnell & Choisser, 1997), New York (Gross & Goertz, 2005),
Pennsylvania (Hamilton et al., 2007; Gross & Goertz, 2005), Vermont (Koretz, McCaffrey,
Klein, Bell, & Stecher, 1993), and Washington (Stecher, Barron, Chun, & Ross, 2000; Borko
& Stecher, 2001).

State assessments focus instruction. Research and practical experience show that 
teachers and principals indeed pay attention to what is tested and adapt their curriculum and 
teaching accordingly. Principals, sometimes with and sometimes without the involvement of 
their staff, analyze test results and develop school plans to concentrate on areas where test 
results show a need for improvement. Research shows that almost all principals also take 
action to assure that teachers engage their students in direct test preparation. Teachers 
consistently report that state tests have a substantial effect on the content they teach and how
they assess student learning. 

Teachers model what is assessed. When many states developed performance 
assessments in the early 1990s, many classroom teachers revised their instruction and 
classroom assessment accordingly. Teachers, in fact, scrambled to replace their own
multiple-choice tests with the same types of open-ended items and/or extended writing questions that
state tests had begun to use. In the mid-1990s, when states largely moved back to
multiple-choice and short-answer formats, teachers’ practice and assessment also reverted to
multiple choice, vocabulary lists, and the like. More recently, Hamilton et al. (2007), in
examining the effects of NCLB in California, Georgia, and Pennsylvania, found more
mathematics teachers in Pennsylvania, as compared to the other two states, reporting
open-ended tests in their classrooms as a result of their state assessment. The researchers attributed
the difference to the use of open-ended items on Pennsylvania’s math assessment. In other 
words, change the test and instruction follows. 

Schools focus on the test rather than the standards. At least initially, educators in 
self-defense pay attention to what is tested and how it is tested, rather than to the underlying 
standards that the tests are supposed to represent. Teachers in Washington, for example, 
reported that their instruction tended to be more like Washington’s state assessment than the 
Washington state standards (Stecher & Borko, 2002). When states like Washington and 
Kentucky tested different topics in different years, researchers found that teachers provided 
more time on particular subjects in the years they were tested than in those they were not 
(Stecher & Barron, 1999; Stecher, Barron, Chun, et al., 2000). If mathematics was tested in
fifth grade but not language arts, teachers taught more mathematics and reduced instruction 
in other subjects such as language arts. These changes were not motivated by any coherent 
sense of curriculum nor were they driven by the need to continuously develop students’ 
learning. 

What is not tested becomes invisible. As a corollary, focusing on the test rather than 
the standards also means that what does not get tested tends to get less attention or may be 
ignored altogether. This seems true both within and across subjects. For example, if
extended math problems are not included on the math test, instructional time may go to 
computation or other problem types that are on the test. Similarly, as more time goes to the 
tested subjects — typically reading, language arts and mathematics — this time must come 
from other areas of the curriculum. 

Curriculum and instruction are aligned. Across the board, districts and schools have
made efforts to align curriculum and instruction with standards. This is particularly true for 
schools failing to meet their targets and identified as underperforming. For example, in the 
national Evaluation of Title I Accountability Systems and School Improvement Efforts, 
Shields et al. (2004) found that 80% of the sampled schools were actively working to
align their curricula with standards and assessment, and many were also implementing new 
curricula in reading/language arts and mathematics. Similarly, the Consortium for Policy 
Research in Education’s (CPRE) study of designated, underperforming high schools 
uniformly shows schools concentrating on aligning curriculum and instruction with
assessments through revisions in their regular curriculum and through the addition of new
courses, test preparation, and remedial and extra-school tutoring (Gross & Goertz, 2005).

More attention to assessment and student data. Shields et al. (2004) also found that
85% of schools reported using student achievement data to target their instruction, echoing
other studies which also report schools using data to identify students who need special help
(Center on Education Policy [CEP], 2006; Hamilton et al., 2007). These studies highlight
that districts and schools increasingly are mandating interim or benchmark assessments
throughout the year to monitor student progress on expected standards and assessments.
These assessments tend to mimic the content and format of state assessments, and their
technical quality is open to question (Herman & Baker, 2005). As a result, the use of these assessments
tends to encourage teachers to keep their eyes firmly on student progress, especially the
knowledge and skills that will be tested, and correspondingly may heighten curriculum
narrowing to focus on what is tested, rather than underlying standards.

At the same time, districts have become more prescriptive about how and what teachers
are supposed to teach, have moved to common instructional materials and have created
pacing guides detailing what is to be covered when, even as far as prescribing which textbook
pages teachers should be covering on any particular day. Interestingly, required, rigid
adherence to pacing guides leaves little time for going back and re-teaching knowledge and
skills which interim tests reveal as weak and can create a discouraging environment for
teachers’ professional practice.

Growing attention to formative assessment. Even so, formative assessment, the use
of classroom assessment to inform ongoing teaching (Wiliam, 2006), shows growing
popularity. Black and Wiliam’s (1998) landmark review showed the potential power of
formative assessment, and educators have increasingly recognized that they need ongoing
information about student learning if they are to be accountable for results. Yet available
evidence suggests that the rhetoric surpasses the reality of formative assessment use: all
teachers are increasingly talking the talk (Herman, Yamashiro, Lefkowitz, & Trusela, 2001;
Hamilton et al., 2007) but the studies looking at practice closer up suggest the challenges of
developing teachers’ capacities to engage in valid, formative practice (Gearhart et al., 2006).

At-risk students face curricular distortions. There also is growing evidence that
curriculum options are grossly narrowing for low scoring students and underperforming
schools. In the context of No Child Left Behind’s requirements for adequate yearly progress,
schools are increasingly focusing on reading and mathematics, to the exclusion of science,
social studies, and the arts; and at the secondary level, low performing students are being
pulled from academic courses to concentrate on literacy development (Gross & Goertz,
2005; Greenleaf, Jimenez, & Roller, 2002; Mintrop & Trujillo, 2007). Indeed one recent 
national study shows that 71% of school districts indicated that they have reduced 
instructional time in at least one other subject to make more time for reading and 
mathematics, and in some districts, struggling students receive double periods of reading 
and/or math, missing electives or other subjects (CEP, 2006). Moreover, there is anecdotal 
evidence that literacy intervention programs for the lowest performing students are devoid of 
actual book reading. Instead, students uniformly read excerpts and seek out answers to the 
information-type questions that are expected to be on the state test. At the extreme, there are 
anecdotes as well about schools using “triage” strategies to focus on what they consider 
“pushables” and “slippables” (relative to reaching proficiency) and virtually ignoring both 
students in greatest educational need and overriding issues of improving instructional quality 
(Booher-Jennings, 2006). Such distortions provide counter evidence to the claim that current 
accountability is improving instruction for low performing students and are worrisome as 
well in the context of a Gates Foundation survey indicating that students do not drop out of 
school primarily because they cannot do the work or pass their courses, but more often 
because they are bored by school (Bridgeland, DiIulio, & Morison, 2006).

So the rhetoric and dominant stories are changing, as are some aspects of practice. 
Across school levels and types of schools, and regardless of the specifics or strength of 
states’ accountability systems or the intensity of their incentives or sanctions, research 
suggests that accountability testing does serve to motivate attention and action and that the 
action so motivated serves to change the alignment of curriculum and instruction with 
standards and assessment and to change students’ opportunities to learn. In some cases, 
changes in curriculum are school wide, and in other cases they are specialized courses or 
services for students identified as at risk. 

But do these changes in instruction and opportunities to learn represent real 
improvements that actually benefit learning for students, and particularly students who are 
most at risk, or do they really impoverish learning opportunities, as critics have charged, 
relegating students to a narrow curriculum of test preparation that is devoid of complex 
thinking and problem solving and devoid of learning in the arts and sciences, as previous 
evidence suggests (National Research Council, 2003; Pelligrino, 2006)? Koretz (2005) has 
conceptualized a number of ways in which schools and teachers respond to the alignment 
challenge, ways which differ dramatically in terms of their potential to improve student 
learning (in contrast to inflating their test scores): from changes in the allocation of time (do 
more of the same), to meaningful alignment of instruction (do something different in
curriculum and instruction), to substantive and non-substantive coaching or test preparation,
to cheating. Available evidence shows more attention to changes in the allocation of time and
attention to test preparation than to changes in the quality or effectiveness of instruction.

Ultimately, the proof of the pudding in whether accountability actually improves
students’ opportunities to learn may lie in student performance. That is, if learning
opportunities are improving, should not such improvements be reflected in student
performance? We turn now to this body of evidence, including studies conducted prior to
2001 and prior to NCLB, when there was more diversity in state accountability systems, and
studies conducted after 2001.

Effects on Performance Prior to 2001

Several studies have used data prior to NCLB from the National Assessment of
Educational Progress (NAEP) to study the effects of accountability on student performance.
Generally their results have been positive. For example, Grissmer, Flanagan, Kawata and
Williamson (2000) hypothesized that accountability reforms might be responsible for the
rapid growth, relative to other states, found for North Carolina and Texas for the period
1990-1996. Similarly, Carnoy and Loeb (2004) observed a relationship between the strength
of a state’s accountability system (its consequences for schools) and gains in the percentage
of students scoring at least “basic” on NAEP mathematics assessments 1996-2000, but saw
no relationship in retention or survival rates. A similar relationship was observed in
percentage of White and African American students scoring at least “proficient” at eighth
grade. Fourth grade effects were less clear in that only African American students showed
significant gains at the basic level. Using slightly different methodologies and different
strategies for dealing with changes in exclusion rates, Braun (2004), Hanushek and Raymond
(2004), and Rosenshine (2003) came to similar conclusions about the relationship between
NAEP gains and high stakes testing systems: results favored high stakes versus no stakes
states.

Student accountability: Ending social promotion. In terms of effects of
consequences for students, research on the Chicago Public Schools “end to social promotion”
policies represents one of the most extended and thorough studies. The authors were careful
to note that the social promotion reform occurred simultaneously with a new accountability
program for the lowest achieving schools in the district and that the effects of the two
programs could not be disentangled. Overall, Roderick, Nagaoka, and Allensworth (2005)
concluded that the 1996 reforms were related to improvements in middle grades performance
that extended into high school. However, there were no effects in the
early grades, perhaps an area where student motivation would not otherwise be problematic.
The details of their findings showed that some students near the cut-off worked harder and 
escaped retention, so the threat of retention helped them. More problematic, however, the 
study found that low achieving students who were retained because of the reform did not 
benefit educationally during the retained year, experienced lower achievement gains in the 
sixth grade than students with similar test scores who were promoted, and based on existing 
research on retention, were at increased risk of dropping out. In short, the most vulnerable 
students did not benefit from this student-level accountability reform. 

Student accountability: High school exit exams. Today’s high school exit exams 
revisit some of these same patterns and echo issues that emerged in response to the minimum 
competency exams of the late 1970s and 1980s. Then, as now, results have shown initial high 
failure rates that decline over time, with large disparities in performance for poor and 
minority students, students with disabilities and English learners (CEP, 2005, 2006; Heubert,
2004). California represents a current example: one year before the graduation test
requirement went into effect, an estimated 78% of the class of 2006 had passed the state’s 
High School Exit examination, leaving nearly 100,000 who had not. By the time of 
graduation in June 2006, an estimated 40,173 students still did not meet the requirement 
(California Department of Education [CDE], July 2006). 

As with the effects of retention, a number of studies have suggested a troubling 
relationship between high school exit exams (or their precursors) and students’ dropping out 
of school. For example, Catterall (1989) found early on that students who had failed to pass a
minimum competency test on their first try, relative to their similar-ability peers, were more
likely to doubt their chances of graduating and to report the possibility of dropping out. 
Subsequently, researchers examining patterns of performance by state found that high school 
enrollment and completion rates generally were lower for economically disadvantaged and/or 
low ability students in states that had such tests compared to states without such tests 
(Reardon, 1996; Bishop, Mane, Moriarty, and Bishop, 2001; Bishop and Mane, 2004). At the
same time, studies found positive effects on subsequent educational success: eighth graders
in states with high school exit exams were more likely to go to college and equally likely to
graduate from college and, controlling for high school graduation, likely to get higher-paying
jobs than their peers in other states (Bishop, Mane, and Moriarty, 2001; Bishop and Mane,
2005).

Different effects for different kinds of tests. Moreover, there also is evidence that the 
effects of high school exit exams may be different for different types of tests. Bishop (2005) 
shared empirical data from a variety of sources to argue that rigorous, course-based exit
examinations, such as those used in Europe and introduced in North Carolina and New York,
benefit student achievement (National Assessment of Educational Progress [NAEP], 2005)
substantially more than typical minimum competency tests, and that these positive
effects are achieved without any increase in dropout rates. Similarly, Darling-Hammond,
Rustique-Forrester, and Pecheone (2005) used evidence from NAEP to show that while states
that use a single exit exam for high school graduation show higher dropout rates, particularly for
African American students, Latino students, students with disabilities, and English learners,
those states that use a multiple measures approach and consider a variety of student work in 
making graduation decisions have tended to maintain high achievement and high graduation 
rates. The nature of the accountability and assessment system apparently matters, and there 
are existence proofs for engendering higher performance without the unacceptable costs 
associated with higher dropout rates. 

Performance Effects Since 2001 

What of more recent effects of accountability? In 2006, Education Week released 10 years of
data from Quality Counts, which has monitored state progress in adopting core elements
of standards-based reform, including establishing academic standards, aligning assessment 
with those standards, implementing accountability measures, and providing supports for 
improving teacher quality. Quality Counts indicators show increases since 1997 in the 
implementation of policies in all of these areas across states, although the trajectories of 
individual states vary considerably (Swanson, 2006). 

National trends on NAEP document a similarly positive trajectory over a similar time 
period, showing definite if modest increases from 2000 to 2005 in mathematics at Grades
4 and 8 and some improvement for Grade 4 reading; Grade 8 reading, however, shows a
decline. Furthermore, while there has been some reduction in the achievement gap during the
10-year period, substantial differences persist, and there has been little or no reduction since
NCLB (Lee, 2006).

Looking at the relationship between states’ changes in standards-based policy 
implementation and their progress on NAEP, Quality Counts shows a consistently positive — 
though again, modest — relationship for policies related to academic content standards, 
aligned assessment, and accountability measures, particularly for mathematics. Oddly,
however, the implementation of policies related to teacher quality was negatively correlated with
performance.

In an effort to take a closer look at what assessment and accountability system 
characteristics might be related to state performance, we tried to use available data to validate
existing quality indicators and to identify states that were over- and under-performing based
on NAEP results. While recognizing the multitude of variables and system levels that could 
influence student learning, we speculated that if accountability was supposed to be a strong 
intervention, the quality of standards and assessments and the nature of the accountability 
system might make a difference for student performance. States that communicated strong 
expectations with clear standards and backed them up with rigorous assessments of high 
technical quality that included multiple measures and in turn could provide accurate 
feedback, we thought, might do better than states whose systems lacked these features. At the 
same time, we thought that we might be able to find differences in the assessment and 
accountability systems of states that showed exceptional performance on NAEP, compared to 
those that did not. Beyond the obvious caveats related to any analysis, we encountered 
conundrums in exploring this avenue.

What states are achieving well? A first challenge was the identification of over- and 
under-performing states, using NAEP as the common, comparable measure across states. We 
reasoned that such classifications should be based on both the status of student performance 
and progress in student performance. We thus identified states whose performance, 
controlling for socioeconomic status (SES), was better than expected across the two recent 
NAEP administrations, 2001 and 2003, in reading and/or mathematics across both Grades 4 
and 8 (regression analysis using concentrations of economically disadvantaged students and
state NAEP results, drawn from School Matters, 2005). Massachusetts and New York stand
out uniquely from this analysis as outperforming states with similar SES in reading at Grades
4 and 8 and mathematics at Grade 8, that is, for three of the four NAEP assessments. South
Carolina, Kansas and Minnesota show better than expected performance in mathematics, for 
both Grades 4 and 8, and Kentucky for reading, in Grades 4 and 8. 
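
The logic of "better than expected, controlling for SES" can be sketched in a few lines of Python. This is only a minimal illustration, assuming a one-predictor least-squares fit of state scores on the percentage of economically disadvantaged students; the function name, the five hypothetical states, and all of the numbers are invented for illustration and are not the data or the exact model used in the analysis described above.

```python
from statistics import mean


def ses_adjusted_residuals(pct_disadvantaged, naep_scores):
    """Fit score = intercept + slope * pct by least squares; return residuals.

    A positive residual means a state scored higher than the fitted line
    predicts for its concentration of economically disadvantaged students.
    """
    x_bar = mean(pct_disadvantaged)
    y_bar = mean(naep_scores)
    sxx = sum((x - x_bar) ** 2 for x in pct_disadvantaged)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(pct_disadvantaged, naep_scores))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x)
            for x, y in zip(pct_disadvantaged, naep_scores)]


# Hypothetical states and made-up values (not actual NAEP or School Matters data).
states = ["A", "B", "C", "D", "E"]
pct = [20.0, 30.0, 40.0, 50.0, 60.0]
scores = [240.0, 236.0, 239.0, 226.0, 224.0]

residuals = ses_adjusted_residuals(pct, scores)
better_than_expected = [s for s, r in zip(states, residuals) if r > 0]
print(better_than_expected)
```

In this toy data set, state A has the highest raw score, yet only state C lies above the fitted line, which is why status adjusted for SES can point to different states than raw rankings do.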

However, in moving to the identification of states whose improvements in performance
were outstanding relative to other states, defined as states showing at least 1 standard
deviation above the mean state gain from 2001 to 2003, the consistent performers generally
are different. Massachusetts was the exception, showing exceptional improvement* for three
of four possible assessments: reading at Grade 4 and mathematics at Grades 4 and 8. Six
additional states achieve better than average on three of four assessments. Of these,
Pennsylvania and Washington show consistent improvement in reading (i.e., at both Grades 4
and 8), and Arkansas, New Jersey, Ohio, and Texas in math. Idaho shows consistent
improvement at Grade 4 in both reading and mathematics.

* Exceptional improvement was defined as a z score equal to or greater than 1.0, relative to the sample of
50 states plus DC.
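
The z-score criterion for exceptional improvement can be illustrated with a short Python sketch. It rests on stated assumptions: the gains below are invented, the state labels are hypothetical, and the population standard deviation is used because the report does not say whether a population or sample formula was applied.

```python
from statistics import mean, pstdev


def exceptional_improvers(gains, threshold=1.0):
    """Return (flagged states, z scores) for a mapping of state -> score gain.

    A state is flagged when its gain is at least `threshold` standard
    deviations above the mean gain across all states in the mapping.
    """
    values = list(gains.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; the source does not specify
    z_scores = {state: (gain - mu) / sigma for state, gain in gains.items()}
    flagged = sorted(s for s, z in z_scores.items() if z >= threshold)
    return flagged, z_scores


# Hypothetical gains in scale points (illustrative only, not NAEP results).
gains = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 2.0, "E": 7.0}
flagged, z_scores = exceptional_improvers(gains)
print(flagged)
```

Because the criterion is relative to the other states, a state can be flagged only by outpacing its peers, regardless of how large or small its gain is in absolute terms.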

Using the Trial Urban District Assessment as another proxy for how traditionally lower
achieving, poor and minority students are doing, results show greater improvement in
mathematics than in reading, with 8 of the 10 participating districts showing a statistically
significant increase, ranging from 4 to 9 scale points, from 2003 to 2005 in fourth-grade
mathematics (NAEP, 2005). Boston, Houston, Los Angeles and San Diego also showed a
significant increase in 8th-grade mathematics, with Los Angeles showing significant
increases across all four assessments. Mirroring weaker performance trends in reading
nationally, only Atlanta showed consistent increases from 2002 to 2003 and 2005 in reading,
while New York showed consistent increases at Grade 4. It is interesting to note that the two
districts that performed highest relative to their peers, Charlotte and Austin, are the only two
that did not show any significant increase over the period. Because both Charlotte and Austin
are in states that early on implemented strong accountability systems, one might suspect that
this lack of improvement may show a topping out of what can be accomplished with
traditional impacts of accountability, without dramatic changes in teaching and learning
practices, but then what explains Houston? And as noted above, Boston has been operating
under Massachusetts’ longstanding and stable accountability system.

The two sets of NAEP analyses (status relative to SES and improvement in scores)
uniquely identify Massachusetts as a high performer, but its state assessment results for the
same period show more modest improvement relative to other states, and as one tries to
compare state assessment and NAEP results in other states, the patterns (or lack thereof) are
puzzling. Moreover, Quality Counts identifies Massachusetts as No. 49 in terms of the
achievement gap on NAEP between students who do and do not qualify for the federal free
lunch program, even as results also show that the state is making progress in closing the gap.
It is also interesting to note that Massachusetts started in 1997 as the highest among the 50
states in its implementation of standards-based reform and has maintained its position over
the years. Could it be that consistency in policy is a contributing factor, even as we know that
relative wealth and early childhood indicators, among many other variables, also contribute?

Furthermore, whereas there is limited consistency in which states are identified as high
performers on NAEP across SES and improvement analyses, there is more consistency in the
under-performers. However, as we see in the next section, it was difficult to differentiate
these states based on features or qualities of their assessment or accountability systems.

Differentiating system characteristics using available indicators. If we believe that
assessment and accountability ought to have benefits for student learning, then it stands to
reason that the quality of the assessment and accountability system ought to matter.
However, it is difficult to get a handle on quality, given the depth and validity of existing
indicators. Quality Counts data show more surface similarities than deep differences in 
current state systems. For example, virtually all states combine multiple-choice testing with 
an extended assessment of language arts (writing). Two thirds of the states also include
open-ended items on their assessments; and half of the states include extended responses in
subjects in addition to language arts. Most states claim they have developed customized tests 
relative to their standards, and almost all claim that they have done alignment studies. Of 
course, these are required under NCLB. But evidence from these alignment studies shows 
uneven quality. For some states these show major imbalances between standards and 
assessment and across states generally reveal a disproportionate representation of lower level 
skills relative to thinking and problem solving (e.g., Webb, 1999). 

Studies commissioned by the Fordham Foundation show a more varied picture of the 
quality of standards and assessment systems across states (Cross, Rebarber, Torres, & Finn, 
2004; Finn et al., 2006; Klein et al., 2005; Stotsky & Finn, 2005) and, as they use somewhat
different criteria, it perhaps is not surprising that we found only modest relationships between
the Fordham and Quality Counts ratings (.49 for ratings of standards in mathematics and .39
for ratings in English, based on analyses of states’ standards in 2005). As a source of
convergent validity evidence for existing indicators, the evidence then is scanty. 

To determine the quality of state tests, the Fordham study reviewed available 
documents and technical reports rather than relying on survey data, which was the source for 
Quality Counts. Rating state assessments in terms of their content, alignment of standards 
and assessment, academic rigor, and technical trustworthiness, the Fordham study found 
significant room for improvement, starting with the availability of materials from which state 
tests could be described and evaluated (Cross et al., 2004). However, three states were
distinguished in receiving high marks in three of six categories: Massachusetts, 
Pennsylvania, and Virginia. 

Effects on Special Populations 

We have noted above glimmerings that accountability is having an impact on low SES
students: nationally and within states, students who qualify for free lunch and, in general,
students served by large urban school districts. What of other special populations? The
research base examining effects on students with disabilities and on English language learner
students is scanty. What we do know is that the validity and comparability of the state
assessment results for these groups are suspect and thus it is hard to get a handle on the status
and progress of their performance. The logic of using state assessment results to support the 
improvement of learning for these groups also is weak, given that the assessment results may 
well lack integrity. Nonetheless, advocates for these groups generally see the inclusion of 
special populations in state accountability systems as a plus, because it has made the 
educational needs of these students visible and mandated expectations and plans for progress 
in mainstream contexts that were too often without them, even if the prior contexts were 
deeply “caring.” At the same time, proponents worry about accountability targets that are
unrealistic and fear backlash when the performance of EL and/or SWD subgroups deters
schools and districts from meeting AYP targets.

What of the impact of accountability on other segments of the student population — 
traditionally higher performing students? On the gifted? The average student? From a 
measurement perspective, we know that it is difficult for a single test of limited duration to 
differentiate and/or motivate students at all points on the achievement spectrum. High ability 
students may be engaged with advanced placement (AP) and college entry exams, as well as
in honors classes and gifted programs, which serve to motivate attention to their learning
needs. But there is no obvious accountability mechanism for the “average student,” who may
have made it just over the proficient level. There is little research on this issue, but one might 
speculate that current federal accountability requirements need to do more to spur attention to 
the learning of “average” students, who represent the majority of students. 

Effects on Teachers 

While a thorough treatment of the effects on teachers is also beyond the scope of this 
report, it is worth noting a growing literature that is cause for concern. Research shows the 
strong relationship between student learning and the quality of teachers (Carey, 2004; 
Haycock, 1998; Sanders & Rivers, 1996) and the quality of interactions between teachers and 
students. Yet any number of survey studies have suggested that teachers believe that current
accountability models are causing schools to focus too much on state tests and are not pushing
schools in educationally productive directions. Their concerns include those raised
earlier about curricular distortion, neglect of complex thinking, and a focus on test format
and test preparation rather than on effective pedagogy (Hoffman, Assaf, & Paris, 2001;
Jones & Egley, 2004; Pedulla et al., 2003). These studies similarly raise questions about
whether accountability is increasing student learning (rather than simply inflating scores on 
state tests) and about the potential negative effects of accountability on teacher morale and 
motivation.

Concerns about accountability effects on teacher morale and motivation in the current
NCLB context are bolstered by both theory and empirical evidence. Expectancy theory
(Vroom, 1964) suggests that motivation is a function of one’s perceived probability of
success (expectancy), connection of success and reward (instrumentality) and value of
obtaining goals (valence). In other words, people are motivated by things that are desirable,
that they know how to do and that they feel capable of achieving. Yet much has been written
about the feasibility of even the most effective schools achieving NCLB adequate yearly
progress goals and closing the achievement gap (Linn, 2003; Rothstein, 2006). Research
further shows that schools serving low performing students and students of color are least 
likely to be able to achieve these goals and have been the first to be subject to increasingly 
severe sanctions (Kim & Sunderman, 2005). Expectancy theory would anticipate serious
negative effects on motivation (and subsequent retention) in these settings where there is a
low expectation of success. Indeed, the theory is supported by empirical evidence from a
North Carolina study documenting the negative effects of strong accountability systems on
low performing schools’ ability to retain teachers in general, and quality teachers in
particular (Clotfelter, Ladd, Vigdor, & Diaz, 2004). 

If ultimately it is professional accountability — teachers’ day-to-day commitment to 
effective practice and their ongoing motivation and sense of pedagogical responsibility — that 
is most important in advancing student learning, then one must question the relationship 
between professional accountability and unrealistic bureaucratic accountability requirements. 

Summary and Conclusions 

So returning to the question of whether accountability is serving the public interest: 
Trite but true, the answer is complicated. Available evidence suggests that the theory of 
action underlying accountability is generally working. 

Support for theory of action. Accountability systems make expectations public and
motivate educators and students to pay attention to learning and performance: Schools are
changing what they are doing, they are focusing on teaching and learning and aligning 
curriculum and instruction with standards — or at least those that are tested. They are working 
to better use data to refine their programs and to identify students who are falling behind. 
Districts and schools are trying to expand available opportunities so that students will get the 
extra help they need to catch up — or at least be proficient in reading and mathematics tests. 
Administrators and teachers are paying better attention to and making plans to respond to and 
engage all their students, particularly traditionally low achieving, identifiable subgroups. 
Moreover, as we look to NAEP results as an external indicator of performance effects, we
find a modest relationship between strength and duration of accountability systems and
improvement in student performance, and we find small improvements in NAEP
performance for economically poor students and perhaps some small movement in closing
the achievement gap.

Danger spots. Admittedly these effects are quite modest compared to the challenge of
helping all children reach their potential, and it must be acknowledged that there are dangers
here for our most vulnerable children: for example, those neediest students who got left
behind in Chicago’s promotion program and who were at greater risk of dropping out;
students who do not pass high school exit exams required for diplomas and are more likely to
drop out; the lowest ability students who may be ignored as schools work to move students
close to the proficiency level over the line.

It is clear from the research that accountability is changing what gets taught, but
whether the change represents real improvement in students’ opportunity to learn is moot.
Research suggests a narrowing of the curriculum to focus on what is tested on state
assessments, and what gets tested, as well as what is included in standards, tends to
overrepresent lower level skills and give scant attention to higher level thinking and complex
applications. In responding to current accountability systems, then, teachers may be less
likely to engage students, particularly low performing and minority children, in
meaningful problem-solving and reasoning activities, or to build the skills that students will
need for success in the 21st century. Teachers echo this concern, believing that accountability
is moving education in the wrong direction. At the same time, unrealistic accountability
targets may be discouraging the best teachers from teaching in schools with high proportions
of academically needy students. The potential combination of meager curriculum and lesser
quality teachers in the long run could increase rather than decrease the real achievement gap.

Yet even against these dangers, perhaps modest positive effects for most students
should be viewed as an important accomplishment, even as we work to make the system
work better for the most vulnerable and work to guard against unintended curriculum effects.
It is sad but true that expecting something of all students, even if it is only what is tested,
may in fact be an improvement for some. We can and should do better.

Toward better accountability systems for the public interest. Accountability 
systems are most apt to serve the public interest when they are designed to maximize benefits 
and minimize negative effects. To maximize the benefits, this report suggests the importance 
of assuring that:

• Standards clearly communicate realistic expectations and represent the knowledge
and skills that students will need for future success. One of the reasons that schools 
may teach to the test is that it is the only concrete guidance they have about learning 
expectations. Standards must provide a solid foundation for assessment, instruction 
and accountability systems and focus on meaningful learning. 

• Accountability systems reflect the full depth and breadth of standards and encourage 
good educational practice. Clearly, on-demand, annual state tests of limited duration
cannot measure all that is important for students to know and be able to do.
Measurement theory and policy analysis suggest the value of multiple measures
(AERA, APA, NCME, 1999; Darling-Hammond et al., 2005). Assessments or
simply accountability requirements that students be engaged in meaningful 
problem-solving and reasoning tasks could help to ameliorate current imbalances. 
Rather than simply mimicking state tests, benchmark assessments could be used to 
expand the depth and breadth of standards coverage and could be embedded in 
meaningful curriculum activities. 

• Performance expectations are suitably high yet attainable. Grossly unrealistic 
performance expectations that carry sanctions are counterproductive to good 
teaching and learning. They undermine teacher motivation and encourage teaching to
the test. Bob Linn (2003) has suggested using the trajectories of the fastest
improving schools as a starting point for setting reasonable targets. Accountability
models that credit schools for improvement in student performance at each level of
the proficiency continuum (e.g., from below basic to basic, from proficient to
advanced), could help assure that schools do not ignore their lowest achieving or 
average students. 
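
The contrast between a status measure and a model that credits movement at every level of the proficiency continuum can be made concrete with a small Python sketch. It is purely hypothetical: the four level names follow the examples in the text, but the one-point-per-level scoring, the function names, and the student data are assumptions made for illustration and do not represent Linn's proposal or any state's actual formula.

```python
# Ordered proficiency levels; one point of credit per level gained is an
# illustrative assumption, not an established scoring rule.
LEVELS = ["below basic", "basic", "proficient", "advanced"]


def percent_proficient(levels):
    """Status measure: percentage of students at or above 'proficient'."""
    cut = LEVELS.index("proficient")
    at_or_above = sum(1 for level in levels if LEVELS.index(level) >= cut)
    return 100 * at_or_above / len(levels)


def level_gain_credit(before, after):
    """Growth measure: average number of levels gained per student.

    Students who drop a level contribute negative credit.
    """
    gains = [LEVELS.index(a) - LEVELS.index(b) for b, a in zip(before, after)]
    return sum(gains) / len(gains)


# Four hypothetical students tracked across two years.
before = ["below basic", "below basic", "basic", "proficient"]
after = ["basic", "basic", "basic", "advanced"]

status_change = percent_proficient(after) - percent_proficient(before)
credit = level_gain_credit(before, after)
print(status_change, credit)
```

In this example three of the four students move up a level, yet the percent-proficient figure does not change; only the level-gain measure registers the progress of the lowest achieving and the already proficient students.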

Yet even with an optimal system, there are limits to what accountability alone
can accomplish. Accountability systems can provide motivation, evidence, and a target for
action, but effective action depends on educators’ capacity. If educators already knew how to 
respond to the needs of their most challenging or all their students, they would be doing so. 
Available evidence cited here suggests that educators are trying, but without dramatic 
success. Without continued investment in capacity building and resources to improve 
teaching and learning, there can be little closing of the achievement gap. 

Even so, there is only so much that public schools can do to close an achievement gap 
that grows out of greater social and historical inequities. As Richard Rothstein (2006, p. 1) 
has observed: 

If as a society we choose to preserve big social class differences, we must necessarily 
also accept substantial gaps between the achievement of lower-class and middle-class 
children. Closing those gaps requires not only better schools, although those are certainly 
needed, but also reform in the social and economic institutions that prepare children to 
learn in different ways. It will not be cheap.

So, is accountability serving the public interest? I say yes, although it is not a
resounding success and clearly we can do better. We need more safeguards in the system to
guard against the potentially deadening effects of accountability and to stimulate the
empowerment and efficacy it can bring when we as educators make a difference. We need to
continue to ask: Are our systems in the public interest? For whom are they working, for
whom not? How do we know? How can we optimize? Research and development (R&D) and
capacity building must continue. But at the same time, we need to be honest about what
accountability systems can and cannot accomplish in helping all children to succeed.